Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New partitioner: KD-Tree partitioner #127

Open
wants to merge 49 commits into
base: master
Choose a base branch
from

Conversation

abualia4
Copy link

@abualia4 abualia4 commented Aug 8, 2019

General

New partitioning technique based on balanced KD-tree was developed.

coverKD method is added to BoxEnvelope.scala file, (spark3D/src/main/scala/com/spark3d/geometryObjects/).
Also I made minor modifications as follows:

  • Partitioner.scala file, (/home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala).
  • Repartitioning.scala, (/home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala).
  • GridTypes.scala, (/home/alia/spark3D/src/main/scala/com/spark3d/utils/GridType.scala).
  • run_viz_scala.sh, to visualize the KD-tree partitioner, (spark3D/).

New added code

The main files for the new Partitioner (KD-tree) are as follows:

  • KDtree.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
  • KDtreePartitioning.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
  • KDtreePartitioning.scala, (spark3D/src/main/scala/com/spark3d/spatialPartitioning/).
  • Test.scala, (spark3D/src/main/scala/com/spark3d/examples/)
  • arun_scala.sh, (spark3D/).

Evaluation of the new code

In order to evaluate the performance of the new spatial partitioning technique which is based on balanced KD-tree, this methodology took empirical approach by comparing it with onion and octree partitioners which were implemented in the spark3D framework, after applying them on two astronomical data sets. These data sets have been generated by CoLoRe fast simulation. The first dataset has more than 37 million 3D points, and data has been randomly drawn along spherical coordinates. The second dataset contains more than seven and a half millions of 3D points, and data has been randomly drawn along Cartesian axes. All implementations were run on the LAL Apache Spark cluster hosted in the virtual data cloud.

Partitioner   Spherical Dataset Cartesian Dataset
-- Number of objects/points 37,079,120 7,768,425
KD-tree Standard Deviation (#of objects) 2,536 6,652
KD-tree Computational Time (Minute) 2.76 1.27
Octree Standard Deviation (#of objects) 276,712 39,317
Octree Computational Time (Minute) 3.7 1.45
Onion Standard Deviation (#of objects) 54,195 38,006
Onion Computational Time (Minute) 15.68 2.65

@JulienPeloton JulienPeloton self-requested a review August 8, 2019 12:33
@JulienPeloton
Copy link
Member

Hi @abualia4, thanks for the PR. Sorry it took me a long time to review it, but I will add my comments now. Please read them, and implement change if any.
I hope you and your family are safe.
Julien

Copy link
Member

@JulienPeloton JulienPeloton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abualia4. Could you please have a look at the comments and implement the suggestions for change in order to merge the PR.

fitsfn=
MASTERURL=spark://134.158.75.222:7077
fitsfn=hdfs://134.158.75.222:8020//lsst/LSST1Y/out_srcs_s1_0.fits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general rule is to not commit URL in a repo. Please keep the previous empty variables:

MASTERURL=
fitsfn=

target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
$inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "username" "api_key"
$inputfn "position_x_mock,position_y_mock,position_z_mock" false "kdtree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?

# --jars ${SP},${HP} --packages ${PACK} \
# --class com.astrolabsoftware.spark3d.examples.VisualizePart \
# target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
# $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?

@@ -0,0 +1,141 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give a meaningful name for this file? test.scala is too vague, and something like TestKDTree.scala would be better.

var minX: Double, var maxX: Double,
var minY: Double, var maxY: Double,
var minZ: Double, var maxZ: Double)
extends Serializable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this class? it is not used elsewhere, so I would recommend to remove this file from the repo.

@@ -84,7 +81,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat
* @return list of Tuple of containing partitions and their index/partition ID's
*/
override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = {

println("getPartitionNodes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary println

@@ -22,7 +22,7 @@ import scala.collection.mutable.ListBuffer

class OctreePartitioning (private val octree: Octree)
extends Serializable {

println("OctreePartitioning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary println

@@ -40,7 +40,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Point3D
*
*/
class OnionPartitioner(grids : List[ShellEnvelope]) extends SpatialPartitioner(grids) {

println("OnionPartitioner")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary println

@@ -31,7 +31,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Shape3D._
*
*/
class OnionPartitioning extends Serializable {

println ("OnionPartitionong")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary println

@@ -0,0 +1,81 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, remove this unnecessary file from the repo.

else
println("Please determine the number of partitions as 1, 2, 4, 8 and so on")
}
else For others
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think else For others is a correct Scala statement... You probably miss a # before For

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, this is why the continuous integration is failing.

@mayurdb
Copy link
Contributor

mayurdb commented Mar 19, 2020 via email

@JulienPeloton
Copy link
Member

Thanks @mayurdb ! I hope you are in a safe place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants